Abstract. We motivate and introduce the basis for a query language designed
for inspecting electronic representations of proofs. We argue that
there is much to learn from large proofs beyond their validity, and that a
dedicated query language can provide a principled way of implementing
a family of useful operations.


1 Motivation 

Increasingly, automated proof tools and interactive theorem provers are called 
upon to produce evidence of their claims, in the form of representations of proofs 
that may be independently checked or, perhaps, imported into another system. 
These electronic proofs must connect together atomic rules of inference and 
axioms in a sound way according to an underlying logic. Checking that this has 
been done correctly is straightforward, although producing the proofs in the first 
place may have been extraordinarily difficult. 

Real proofs can be very large, perhaps consisting of tens or hundreds of
thousands of atomic rules of inference. There are many things that are interesting
to know about such objects, beyond the basic fact that they are correctly
constructed. For example, some natural questions when inspecting a proof are:

— What is the high-level structure of this proof? (How) can we break it down
into pieces to understand it?

— Given a proof of a property which exploits a set of domain-specific axioms, 
which axioms actually occurred in the proof? (Or, in a purely logical setting, 
does a proof rely on axioms of classical logic?) 

— Given a problem statement which contains some existential propositions as 
sub-formulae, which, if any, witnesses were found to make them true? 

— Does a large proof contain duplicated parts that could be abstracted (or 
generalised) into a separate lemma, using a cut-like rule to reduce the size 
of the proof? 
When the user is trying to understand the proof construction process, particularly
if using interactive or semi-interactive provers, there are further natural
questions which relate the constructed proof back to the procedures that produced
it. For example, if tactics are our notion of proof-producing procedure,
there are natural questions to ask which relate the proof and the tactics that
produced it:

— Given a set of tactics and a proof, which tactics were invoked in producing 
the proof and how were they invoked (i.e., which subgoals were solved by 
which tactics)? 

— Were any tactics used repeatedly in this proof, perhaps with similar or identical
inputs?

— Did some tactics get invoked but do no useful work? 

— Given a failed proof (represented as a proof with unproved portions), which 
tactics were tried on the unproved portions? 

These kinds of questions are not idle curiosities: we believe that they are
useful for practical proof engineering, when managing and maintaining sets of
properties, proofs and the programs which check and create them. For example, one
of us (Denney) routinely resorts to low-level scripted tools to perform these kinds
of examinations when building large safety cases supported by formal proofs [4].
One can approach this in a more principled manner, with the hope of enabling
more general tools with clear foundations. In this short paper, we introduce some
first ideas for a query language designed for querying proofs.

Hierarchical proofs. We build on the foundation of Hiproofs [8,2], which provides
a simple abstract notion of proof tree. Hiproofs represent proof trees by
composing atomic rules of inference from an unspecified underlying logic. Going
beyond ordinary trees, they have a notion of hierarchy, provided by a way to nest
and label a subtree. This simple addition provides a precise and useful notion
of structure in the proof which can be used, for example, for noting where a
lemma was applied, or where a particular tactic or external proof tool produced
a subtree.

Sub-proof labelling, when it is present in a proof, immediately allows us 
to address the first questions above concerning overall proof structure and the 
application points of tactics. Subtrees provide structuring that can give hints 
to understanding the constructed proof object. Labels act as reference points to 
connect back to the proof-producing program. Note, however, that labels are not 
enough to completely capture the story of how a proof was produced since they 
only record success points, not points where some sub-procedure was attempted 
but failed to produce a proof. So some forms of query may refer to a proof and 
its construction procedure together, or, equivalently, return results by querying 
the search tree that was explored during its construction. 

4 Of course, manipulating proof objects to change them is also interesting, although
it might sometimes be better done on the input to the systems that construct proofs.
In this preliminary study we restrict ourselves to queries which return pieces of the 
queried objects, without further manipulation. 
In practice, of course, proof tools already have mechanisms that provide
some of these features. For example, externally invoked procedures may have their
results (and perhaps justifications) grafted into an overall proof, or at least
have the fact that they were applied recorded [9]. Noteworthy sub-trees may be
given names for reference (and then shared to create a DAG structure), as in
TPTP [12]. In many systems, switches may be used to turn on debugging
output for proof procedures, creating a lengthy log which explains where things
were tried and failed.


2 Hiproofs 


Hiproofs add structure to an underlying derivation system, a simple form of
logical framework. We give a brief recap here; the reader is referred to [2,8] for
full details.

A hiproof is built from (inverted) atomic inference rules a in the underlying
derivation system: it maps input goals $[\gamma_1,\ldots,\gamma_n]$ to output subgoals
$[\gamma'_1,\ldots,\gamma'_m]$. A nested hiproof, appearing immediately inside a labelled
box, always has a single input goal which is the root of the tree at that level.

Informally and graphically, we draw hiproofs as inverted trees with a nested 
structure. Denotationally, a hiproof can be understood as a pair of a tree and a 
forest with the same set of nodes, subject to some well-formedness conditions. 
Syntactically, a hiproof can be written as a term s in the grammar below: 


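$$
\begin{array}{rcll}
s & ::= & a & \text{atomic}\\
  & \mid & \mathsf{id} & \text{identity}\\
  & \mid & [l]\, s & \text{labelling}\\
  & \mid & s_1 ; s_2 & \text{sequencing}\\
  & \mid & s_1 \otimes s_2 & \text{tensor (juxtaposition)}\\
  & \mid & \langle\rangle & \text{empty}
\end{array}
\qquad (1)
$$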


Fig. 1 shows the graphical representation of an example hiproof and its term
equivalent. Boxes indicate nestings and have labels in their top corners; unlabelled
boxes contain atomic rules. Tensor places things side-by-side and sequencing
builds “wiring” to connect things together, using identity to create
wires where a goal is not manipulated. In the example, id exports the second
subgoal from the atomic rule a outside the box labelled l.

$$([l]\, a ; b \otimes \mathsf{id}) ; [m]\, c$$

Fig. 1. A hiproof and its graphical representation.
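As a minimal sketch, assuming nothing beyond grammar (1), hiproof terms can be represented as immutable Python values. The class names (`Atomic`, `Id`, `Empty`, `Label`, `Seq`, `Tensor`) and the helpers `atomics` and `labels` are our own illustrative choices, not part of any existing tool. We read the Fig. 1 term with [l] scoping over a ; b ⊗ id, which matches the description of id exporting a's second subgoal out of box l.

```python
from dataclasses import dataclass
from typing import List, Union


# One class per production of grammar (1); frozen, so terms are hashable values.
@dataclass(frozen=True)
class Atomic:
    name: str  # an atomic rule a


@dataclass(frozen=True)
class Id:
    pass  # identity wire: passes a goal through unchanged


@dataclass(frozen=True)
class Empty:
    pass  # the empty hiproof


@dataclass(frozen=True)
class Label:
    label: str  # [l] s
    body: "Hiproof"


@dataclass(frozen=True)
class Seq:
    left: "Hiproof"  # s1 ; s2
    right: "Hiproof"


@dataclass(frozen=True)
class Tensor:
    left: "Hiproof"  # s1 ⊗ s2, placed side by side
    right: "Hiproof"


Hiproof = Union[Atomic, Id, Empty, Label, Seq, Tensor]


def atomics(s: Hiproof) -> List[str]:
    """Names of the atomic rules in s, read left to right."""
    if isinstance(s, Atomic):
        return [s.name]
    if isinstance(s, Label):
        return atomics(s.body)
    if isinstance(s, (Seq, Tensor)):
        return atomics(s.left) + atomics(s.right)
    return []  # Id and Empty contain no atomic rules


def labels(s: Hiproof) -> List[str]:
    """Box labels in pre-order (an outer box before the boxes inside it)."""
    if isinstance(s, Label):
        return [s.label] + labels(s.body)
    if isinstance(s, (Seq, Tensor)):
        return labels(s.left) + labels(s.right)
    return []


# The Fig. 1 term, read as ([l] (a ; b ⊗ id)) ; [m] c (assumed bracketing).
fig1 = Seq(
    Label("l", Seq(Atomic("a"), Tensor(Atomic("b"), Id()))),
    Label("m", Atomic("c")),
)
```

For the Fig. 1 term, `atomics(fig1)` is `["a", "b", "c"]` and `labels(fig1)` is `["l", "m"]`; the `id` wire contributes nothing to either. Frozen dataclasses give structural equality and hashing, which is convenient when collecting sub-hiproofs in sets.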

Validation. A hiproof is called valid if it corresponds to a real proof tree in the
underlying derivation system. For example, the hiproof in Fig. 1 validates a proof
tree built from the atomic inference rules a, b and c, provided we have input and
output goals that can be proved with these rules. We write $s \vdash g_1 \longrightarrow g_2$
if s is a valid hiproof that takes input goals $g_1$ and produces output goals $g_2$.

A valid hiproof can be seen, then, as a nested labelling applied to a flat 
proof. In [2] we introduced a kernel tactic language which extends hiproofs with 
the well-known procedural tactic mechanisms for computing proofs: recursion 
for repetition, alternation for trying one thing or backtracking to another, and 
testing subgoals to introduce decision points. In El this is taken further by 
providing a declarative tactic language. 

3 Queries 

One design option would be to take an existing query language for graph (or
semi-structured) data models (e.g., see [1] for models and [3] for web query languages),
and then map from hiproofs into the existing language and use queries there.
We prefer instead to start from queries written in a native query language closer
to hiproofs, and give a direct semantics for them. This gives us a clearer idea of
what queries we need and helps keep the semantics precise; we consider translations,
whether to establish bounds on performance or perhaps for practical implementation,
a secondary matter.

To begin with, we want a simple query to be able to inspect and return parts
of a hiproof. We defer relating proofs to their production mechanisms (the second
category of examples in the introduction) to later work. Thus queries may return
atomic rule names a, labels l, or sub-hiproofs s. These will be selected by paths
that match the hiproof tree and pick out pieces. Queries are then constructed
by generating sets of paths using path expressions, and filtering with simple
propositions to select those of interest.

3.1 Paths 

We use hiproof constructors to build up paths. A path navigates down through 
the structure, choosing left and right branches of tensors, and entering boxes, 
until hitting a chosen point. So the hiproof constructors themselves can serve as 
labels. 

Definition 1 (Path). A path is denoted as follows:

$$p ::= \bullet \mid [-]\,p \mid p \otimes - \mid - \otimes p \mid p\,;- \mid -\,;p$$
A path selects a part of a hiproof (Definition 3 below) if the shape of the
hiproof fits with the path; in this case we say that the path is well-formed.

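Definition 2. Given a hiproof s, the set $\mathcal{P}(s)$ of all paths well-formed with
respect to s is defined recursively as follows:

$$
\begin{aligned}
\mathcal{P}(a) &= \{\bullet\}\\
\mathcal{P}(\mathsf{id}) &= \{\}\\
\mathcal{P}([l]\,s) &= \{\bullet\} \cup \{[-]\,q \mid q \in \mathcal{P}(s)\}\\
\mathcal{P}(s_1 \otimes s_2) &= \{\bullet\} \cup \{p \otimes - \mid p \in \mathcal{P}(s_1)\} \cup \{- \otimes p \mid p \in \mathcal{P}(s_2)\}\\
\mathcal{P}(s_1 ; s_2) &= \{\bullet\} \cup \{p\,;- \mid p \in \mathcal{P}(s_1)\} \cup \{-\,;p \mid p \in \mathcal{P}(s_2)\}
\end{aligned}
$$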
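Definition 2 can be run directly. The following is a minimal sketch of our own, not code from the paper: it uses a small hiproof data type and encodes a path as a tuple of steps read from the root, with the hypothetical step names `"box"` for [−], `"tl"`/`"tr"` for p ⊗ − and − ⊗ p, and `"sl"`/`"sr"` for p ; − and − ; p; the empty tuple is •. Definition 2 does not mention the empty hiproof, so treating it like id (no paths) is an assumption. The example reads the Fig. 1 term with [l] scoping over a ; b ⊗ id.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union

Path = Tuple[str, ...]  # () is the empty path •


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Id:
    pass


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Label:
    label: str
    body: "Hiproof"


@dataclass(frozen=True)
class Seq:
    left: "Hiproof"
    right: "Hiproof"


@dataclass(frozen=True)
class Tensor:
    left: "Hiproof"
    right: "Hiproof"


Hiproof = Union[Atomic, Id, Empty, Label, Seq, Tensor]


def paths(s: Hiproof) -> Set[Path]:
    """All paths well-formed with respect to s, clause by clause as in Definition 2."""
    if isinstance(s, Atomic):
        return {()}
    if isinstance(s, (Id, Empty)):  # P(id) = {}; Empty treated the same (assumption)
        return set()
    if isinstance(s, Label):
        return {()} | {("box",) + q for q in paths(s.body)}
    if isinstance(s, Tensor):
        return ({()} | {("tl",) + p for p in paths(s.left)}
                | {("tr",) + p for p in paths(s.right)})
    if isinstance(s, Seq):
        return ({()} | {("sl",) + p for p in paths(s.left)}
                | {("sr",) + p for p in paths(s.right)})
    raise TypeError(f"not a hiproof: {s!r}")


# Fig. 1, read as ([l] (a ; b ⊗ id)) ; [m] c
fig1 = Seq(Label("l", Seq(Atomic("a"), Tensor(Atomic("b"), Id()))),
           Label("m", Atomic("c")))
```

For this term `paths(fig1)` contains eight paths, among them `("sl", "box", "sr", "tl")`, which encodes ([−](−;(• ⊗ −)));−. Because P(id) is empty, there is no path ending in the `id` wire.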


A well-formed path selects a sub-hiproof of a given hiproof, called its target,
defined in the obvious way as follows:

Definition 3 (Selection). For a hiproof s and a path $p \in \mathcal{P}(s)$, the target of
p is defined as a selection of s as follows:

$$
\begin{aligned}
\mathit{sel}(\bullet, s) &= s\\
\mathit{sel}([-]\,p, [l]\,s) &= \mathit{sel}(p, s)\\
\mathit{sel}(p \otimes -, s_1 \otimes s_2) &= \mathit{sel}(p, s_1)\\
\mathit{sel}(- \otimes p, s_1 \otimes s_2) &= \mathit{sel}(p, s_2)\\
\mathit{sel}(p\,;-, s_1 ; s_2) &= \mathit{sel}(p, s_1)\\
\mathit{sel}(-\,;p, s_1 ; s_2) &= \mathit{sel}(p, s_2)
\end{aligned}
$$
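Selection is equally direct. The sketch below is ours and assumes a tuple encoding of paths with the hypothetical step names `"box"` for [−], `"tl"`/`"tr"` for p ⊗ − and − ⊗ p, and `"sl"`/`"sr"` for p ; − and − ; p, with `()` as •. It follows Definition 3 clause by clause; it does not re-check that p ∈ P(s), and only raises an error when the shape of the path does not fit the hiproof. The Fig. 1 term is read with [l] scoping over a ; b ⊗ id.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Path = Tuple[str, ...]  # () is the empty path •


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Id:
    pass


@dataclass(frozen=True)
class Label:
    label: str
    body: "Hiproof"


@dataclass(frozen=True)
class Seq:
    left: "Hiproof"
    right: "Hiproof"


@dataclass(frozen=True)
class Tensor:
    left: "Hiproof"
    right: "Hiproof"


Hiproof = Union[Atomic, Id, Label, Seq, Tensor]  # Empty omitted for brevity


def sel(p: Path, s: Hiproof) -> Hiproof:
    """Target of path p in hiproof s (Definition 3)."""
    if not p:  # sel(•, s) = s
        return s
    step, rest = p[0], p[1:]
    if step == "box" and isinstance(s, Label):  # sel([-]p, [l]s) = sel(p, s)
        return sel(rest, s.body)
    if step in ("tl", "tr") and isinstance(s, Tensor):
        return sel(rest, s.left if step == "tl" else s.right)
    if step in ("sl", "sr") and isinstance(s, Seq):
        return sel(rest, s.left if step == "sl" else s.right)
    raise ValueError(f"path {p!r} does not fit {s!r}")


# Fig. 1, read as ([l] (a ; b ⊗ id)) ; [m] c
fig1 = Seq(Label("l", Seq(Atomic("a"), Tensor(Atomic("b"), Id()))),
           Label("m", Atomic("c")))
```

On the Fig. 1 term, `sel(("sl", "box", "sr", "tl"), fig1)` is `Atomic("b")` and `sel(("sr", "box"), fig1)` is `Atomic("c")`, while `sel((), fig1)` returns the whole hiproof.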


A simple but worthwhile observation is that selection preserves validity. 

Lemma 1 (Validity Preservation). Given a valid hiproof $s \vdash g_1 \longrightarrow g_2$,
for all $p \in \mathcal{P}(s)$ there are goals $h_1, h_2$ such that $\mathit{sel}(p, s) \vdash h_1 \longrightarrow h_2$.

This is proven by inspecting the validation of the hiproof. Given a validation,
we can extend sel(p, s) to return the concrete lists of goals $h_1$ and $h_2$ consumed
and produced by the selected sub-hiproof. In particular, this allows us to inspect
subgoals inside the proof, or check the arity of a sub-proof or atomic rule. An atomic
rule a has an input arity n given by its number of premises, written a : n. Axioms
have no premises, so a : 0 says that a is an axiom.

Operations and propositions on paths give us the path algebra. 

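Definition 4 (Path concatenation). For two paths p and q, their concatenation
$p \mathbin{+\!\!+} q$ is defined in the obvious way:

$$
\begin{aligned}
\bullet \mathbin{+\!\!+} q &= q\\
[-]\,p \mathbin{+\!\!+} q &= [-]\,(p \mathbin{+\!\!+} q)\\
(p \otimes -) \mathbin{+\!\!+} q &= (p \mathbin{+\!\!+} q) \otimes -\\
(- \otimes p) \mathbin{+\!\!+} q &= - \otimes (p \mathbin{+\!\!+} q)\\
(p\,;-) \mathbin{+\!\!+} q &= (p \mathbin{+\!\!+} q)\,;-\\
(-\,;p) \mathbin{+\!\!+} q &= -\,;(p \mathbin{+\!\!+} q)
\end{aligned}
$$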


Concatenation is associative and has the empty path • as left and right unit. 
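Definition 4 can be transcribed with paths as nested values mirroring Definition 1. This sketch is ours, with hypothetical constructor names: `Dot` for •, `Box` for [−] p, `TL`/`TR` for p ⊗ − and − ⊗ p, and `SL`/`SR` for p ; − and − ; p. Every clause of Definition 4 other than the first has the form C(p) ++ q = C(p ++ q), so one generic case covers them. We also include a test for the prefix ordering p ≤ q, i.e. ∃r. p ++ r = q, used by the queries of Section 3.2.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Dot:
    pass  # the empty path •


@dataclass(frozen=True)
class Box:
    inner: "Path"  # [-] p


@dataclass(frozen=True)
class TL:
    inner: "Path"  # p ⊗ -


@dataclass(frozen=True)
class TR:
    inner: "Path"  # - ⊗ p


@dataclass(frozen=True)
class SL:
    inner: "Path"  # p ; -


@dataclass(frozen=True)
class SR:
    inner: "Path"  # - ; p


Path = Union[Dot, Box, TL, TR, SL, SR]


def concat(p: Path, q: Path) -> Path:
    """p ++ q, following Definition 4."""
    if isinstance(p, Dot):
        return q  # • ++ q = q
    # C(p') ++ q = C(p' ++ q) for every one-argument constructor C
    return type(p)(concat(p.inner, q))


def is_prefix(p: Path, q: Path) -> bool:
    """p <= q, i.e. there is some r with concat(p, r) == q."""
    if isinstance(p, Dot):
        return True  # take r = q
    # same outermost constructor, and the inner path is again a prefix
    return type(p) is type(q) and is_prefix(p.inner, q.inner)
```

For example, `concat(SL(Box(Dot())), SR(TL(Dot())))` is `SL(Box(SR(TL(Dot()))))`, that is, the path ([−](−;(• ⊗ −)));−. The structural recursion makes associativity and the two unit laws easy to check on small cases, although such tests are of course no substitute for the inductive proof.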


3.2 Queries 

A query is an operation which selects (interesting) pieces of a hiproof, given by 
one or more paths. Queries are built using comprehension schemes of first-order 
logic over an algebra of schemes and paths. 

To be precise, let $\mathit{Var}_A$, $\mathit{Var}_L$, $\mathit{Var}_S$ and $\mathit{Var}_P$ be disjoint, countably infinite
sets of variables for atomics, labels, hiproofs and paths, respectively, ranged over
by the indicated capital letters. The hiproof expressions are hiproofs built over
atomic, label and hiproof variables, using the operations in (1) and the selection
operation from Definition 3 above. The path expressions are built using paths, path
variables and the path operation ++. The path propositions are expressions of
first-order logic over equations between path expressions, hiproof expressions, or
atomic propositions that constrain goals or atomic rules.

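An example of an expressible useful derived property is the prefix ordering
between paths, together with its strict version:

$$p \le q \iff \exists r.\; p \mathbin{+\!\!+} r = q \qquad\qquad p < q \iff p \le q \land p \neq q$$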

A simple query is a set comprehension scheme

$$\{P \in \mathcal{P}(s) \mid \phi(P)\}$$

where s is the hiproof to query and $\phi(P)$ is a path proposition selecting the
interesting paths. Most of our queries return atomic tactics or labels, though, so
we allow the following extensions. For paths to return atomic tactics, we have

$$\{A \mid P \in \mathcal{P}(s), \phi(A, P, s)\} = \{\mathit{sel}(P, s) \mid P \in \mathcal{P}(s) \land \exists A.\, \mathit{sel}(P, s) = A \land \phi(A, P, s)\}$$

where $\phi(A, P, s)$ is a proposition over an atomic tactic A, a path P and the
hiproof s. Thus a query which returns a set of atomic tactics is a shorthand
for a query which returns a set of paths, each guaranteed to select an atomic tactic.
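Both comprehension forms read naturally as Python set comprehensions. This is a sketch of our own, not an implementation from the paper: paths are tuples of hypothetical step names (`"box"` for [−], `"tl"`/`"tr"` for the two sides of ⊗, `"sl"`/`"sr"` for the two sides of ;), a proposition φ is modelled as a Python predicate, and `query_atomics` keeps only paths whose target is an atomic, which is the role of ∃A. sel(P, s) = A above. The example term is Fig. 1, read with [l] scoping over a ; b ⊗ id.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple, Union

Path = Tuple[str, ...]  # () is the empty path •


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Id:
    pass


@dataclass(frozen=True)
class Label:
    label: str
    body: "Hiproof"


@dataclass(frozen=True)
class Seq:
    left: "Hiproof"
    right: "Hiproof"


@dataclass(frozen=True)
class Tensor:
    left: "Hiproof"
    right: "Hiproof"


Hiproof = Union[Atomic, Id, Label, Seq, Tensor]  # Empty omitted for brevity


def paths(s: Hiproof) -> Set[Path]:
    """P(s) from Definition 2."""
    if isinstance(s, Atomic):
        return {()}
    if isinstance(s, Id):
        return set()
    if isinstance(s, Label):
        return {()} | {("box",) + q for q in paths(s.body)}
    left, right = ("tl", "tr") if isinstance(s, Tensor) else ("sl", "sr")
    return ({()} | {(left,) + p for p in paths(s.left)}
            | {(right,) + p for p in paths(s.right)})


def sel(p: Path, s: Hiproof) -> Hiproof:
    """sel(p, s) from Definition 3; assumes p is in P(s), shape not re-checked."""
    for step in p:
        s = s.body if step == "box" else (s.left if step in ("tl", "sl") else s.right)
    return s


def query(s: Hiproof, phi: Callable[[Path], bool]) -> Set[Path]:
    """{P in P(s) | phi(P)}"""
    return {P for P in paths(s) if phi(P)}


def query_atomics(s: Hiproof,
                  phi: Callable[[Atomic, Path, Hiproof], bool]) -> Set[Atomic]:
    """{A | P in P(s), phi(A, P, s)}: only paths whose target is an atomic."""
    return {sel(P, s) for P in paths(s)
            if isinstance(sel(P, s), Atomic) and phi(sel(P, s), P, s)}


# Fig. 1, read as ([l] (a ; b ⊗ id)) ; [m] c
fig1 = Seq(Label("l", Seq(Atomic("a"), Tensor(Atomic("b"), Id()))),
           Label("m", Atomic("c")))
```

Here `query(fig1, lambda P: len(P) == 1)` returns the two paths one step below the root, `{("sl",), ("sr",)}`, and `query_atomics(fig1, lambda A, P, s: P[:1] == ("sr",))` returns `{Atomic("c")}`, the only atomic on the right of the top-level sequence.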

For the examples we give below, we constrain atomic rules by giving a subset
we want to choose (for example, those that are axioms or those that prove
existential statements). Other examples examine concrete goals that appear in
the proof (via the hiproof's validation). Similarly, in the Hitac tactic language [5]
we used a matching relation assert as an abstract constraint on goals γ.
3.3 Example queries

Finally we illustrate our ideas with a few examples that show how to answer
some of the questions posed in Section 1.

— To find all axioms in a valid hiproof s:

$$\mathit{Axioms}(s) = \{A \mid P \in \mathcal{P}(s).\ \mathit{sel}(P, s) = A \land A : 0\}$$

— To find existential witnesses inside a valid hiproof s, we suppose that the
introduction rule for the existential, exI, is a set of atomic tactics
$\mathit{exI} = \{\mathit{exI}_t\}_{t \in T}$ indexed by a set of terms T (witnesses) in the
underlying logic. The witness query returns the instantiated existential rules:

$$\mathit{Wit}(s) = \{A \mid P \in \mathcal{P}(s).\ \mathit{sel}(P, s) = A \land A \in \mathit{exI}\}$$

— Which goals are input to (or output from) a tactic called tac?

$$\mathit{Input}(\mathit{tac}, s) = \{g \mid P \in \mathcal{P}(s).\ \exists S_1.\, \mathit{sel}(P, s) = [\mathit{tac}]\, S_1 \land S_1 \vdash g \longrightarrow h\}$$

$$\mathit{Output}(\mathit{tac}, s) = \{h \mid P \in \mathcal{P}(s).\ \exists S_1.\, \mathit{sel}(P, s) = [\mathit{tac}]\, S_1 \land S_1 \vdash g \longrightarrow h\}$$

— Which tactics call themselves recursively? Note how this query has two
generating expressions, $P \in \mathcal{P}(s)$ and $Q \in \mathcal{P}(s)$:

$$\mathit{Rec}(s) = \{L \mid P \in \mathcal{P}(s), Q \in \mathcal{P}(s).\ P < Q \land \exists S_1.\, \mathit{sel}(P, s) = [L]\, S_1 \land \exists S_2.\, \mathit{sel}(Q, s) = [L]\, S_2\}$$

This returns labels L which label subtrees that contain the same label L again.

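— Which tactic uses atomic tactic a, i.e., inside which label does a occur? This
query returns all labels L whose box contains a directly, i.e., no other labelled
box lies strictly between the box labelled L and the occurrence of a:

$$
\begin{aligned}
\mathit{Inside}(a, s) = \{L \mid{} & P \in \mathcal{P}(s), Q \in \mathcal{P}(s).\ P < Q \land \exists S_1.\, \mathit{sel}(P, s) = [L]\, S_1 \land \mathit{sel}(Q, s) = a \land{}\\
& \forall R \in \mathcal{P}(s).\, (P < R \land R < Q \Rightarrow \neg \exists M, S_2.\, \mathit{sel}(R, s) = [M]\, S_2)\}
\end{aligned}
$$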
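The structural queries above (Axioms, Rec and Inside) can be evaluated by brute force over P(s). The sketch below is ours, not an implementation from this paper: paths are tuples of hypothetical step names (`"box"` for [−], `"tl"`/`"tr"` for the two sides of ⊗, `"sl"`/`"sr"` for the two sides of ;), so ++ is tuple concatenation and the strict prefix P < Q becomes a tuple-prefix test. The arity table passed to `axioms` is a hypothetical stand-in for the relation a : n, and the Fig. 1 term is read with [l] scoping over a ; b ⊗ id. Wit, Input and Output need the underlying logic and a validation, so they are not covered.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

Path = Tuple[str, ...]  # () is the empty path •


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Id:
    pass


@dataclass(frozen=True)
class Label:
    label: str
    body: "Hiproof"


@dataclass(frozen=True)
class Seq:
    left: "Hiproof"
    right: "Hiproof"


@dataclass(frozen=True)
class Tensor:
    left: "Hiproof"
    right: "Hiproof"


Hiproof = Union[Atomic, Id, Label, Seq, Tensor]  # Empty omitted for brevity


def paths(s: Hiproof) -> Set[Path]:
    """P(s) from Definition 2."""
    if isinstance(s, Atomic):
        return {()}
    if isinstance(s, Id):
        return set()
    if isinstance(s, Label):
        return {()} | {("box",) + q for q in paths(s.body)}
    left, right = ("tl", "tr") if isinstance(s, Tensor) else ("sl", "sr")
    return ({()} | {(left,) + p for p in paths(s.left)}
            | {(right,) + p for p in paths(s.right)})


def sel(p: Path, s: Hiproof) -> Hiproof:
    """sel(p, s) from Definition 3; assumes p is in P(s), shape not re-checked."""
    for step in p:
        s = s.body if step == "box" else (s.left if step in ("tl", "sl") else s.right)
    return s


def lt(p: Path, q: Path) -> bool:
    """Strict prefix p < q (here ++ is tuple concatenation)."""
    return len(p) < len(q) and q[:len(p)] == p


def axioms(s: Hiproof, arity: Dict[str, int]) -> Set[str]:
    """Axioms(s): names of selected atomic rules A with A : 0."""
    return {t.name for t in (sel(P, s) for P in paths(s))
            if isinstance(t, Atomic) and arity[t.name] == 0}


def rec(s: Hiproof) -> Set[str]:
    """Rec(s): labels L whose box contains another box labelled L."""
    ps = paths(s)
    return {sel(P, s).label for P in ps for Q in ps
            if lt(P, Q) and isinstance(sel(P, s), Label)
            and isinstance(sel(Q, s), Label)
            and sel(P, s).label == sel(Q, s).label}


def inside(a: str, s: Hiproof) -> Set[str]:
    """Inside(a, s): labels L whose box contains atomic a directly."""
    ps = paths(s)
    found = set()
    for P in ps:
        for Q in ps:
            box, target = sel(P, s), sel(Q, s)
            if not (lt(P, Q) and isinstance(box, Label) and target == Atomic(a)):
                continue
            # the universally quantified R: no labelled box strictly between P and Q
            if not any(lt(P, R) and lt(R, Q) and isinstance(sel(R, s), Label)
                       for R in ps):
                found.add(box.label)
    return found


# Fig. 1, read as ([l] (a ; b ⊗ id)) ; [m] c
fig1 = Seq(Label("l", Seq(Atomic("a"), Tensor(Atomic("b"), Id()))),
           Label("m", Atomic("c")))
```

With the hypothetical arities `{"a": 2, "b": 0, "c": 0}`, `axioms(fig1, ...)` gives `{"b", "c"}`, and `inside("a", fig1)` gives `{"l"}`. Enumerating pairs (and, for `inside`, triples) of paths is roughly cubic in the number of paths; that is fine for illustrating the semantics, but it is not meant as an evaluation strategy.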


4 Future Work 

This brief paper introduces some of our ideas for proof query languages. Much 
remains to be done: we plan to first complete our study of the semantics for 
the query constructs, and then to introduce a more user-friendly language for 
actually writing queries, using the above comprehension schemas to give their 
denotation. Then we need to give an account of how queries are evaluated: this 
might be with a direct operational interpretation, or via an auxiliary mechanism. 
Further out, we want to set this work in the context of related query languages,
perhaps by translations as suggested above. See, e.g., [1] for some expressivity
and complexity results.
Meanwhile, we are also keen to explore moving the hiproof formalisation
closer to usable implementations; not to replace incumbent systems with their
large machinery and proof libraries, but to serve as an experimental platform
for studying proof languages more precisely. See m for an example in this direction,
describing a declarative language for hiproofs and also some refactoring
operations to model changes undertaken in real proof developments. Such refactorings
change the input to proof tools without changing the statements being
proved, but may alter the resulting proof objects or their structure.

Related work. We believe that the idea of a dedicated query language for inspecting
proofs is novel, although there are some related investigations on particular
ways of exploiting proofs. These include, for example, efforts to translate proofs
between systems [7]; ways to discover dependencies between parts of proofs m
to help simplify or rearrange them; and ways to mine proofs to discover common patterns
[13]. Away from theorem proving, query languages have been introduced for
other forms of structured data, including semi-structured (XML-like) models [5],
and programs or their intermediate forms during compilation ECU].